Skip to content

ci(skills-eval): wire schedule to MANUAL_FULL_SWEEP + sync dispatch picker#701

Merged
shubhadeepd merged 1 commit into
mainfrom
ci/skills-eval-schedule-fix
Jun 30, 2026
Merged

ci(skills-eval): wire schedule to MANUAL_FULL_SWEEP + sync dispatch picker#701
shubhadeepd merged 1 commit into
mainfrom
ci/skills-eval-schedule-fix

Conversation

@richa-nvidia

@richa-nvidia richa-nvidia commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Two independent fixes in .github/workflows/skills-eval.yml:

  1. Schedule trigger now exports MANUAL_FULL_SWEEP=1 (same as workflow_dispatch already does). Before this, nightly runs crashed with FATAL: PR_NUMBER not set in environment — see GHA run 28214488174.

  2. workflow_dispatch picker now lists the 3 skills that actually exist on disk (rag-blueprint, rag-eval, rag-perf) instead of 10 phantom names left over from an old taxonomy. The LLM already enumerates skills/*/eval/*.json itself, so the stale picker was a UX trap — anyone picking an old name got a silent BLOCKED. Closes the gotcha documented at ONBOARDING.md:487-492.

Local verification (all green)

  • YAML + workflow lint: parses clean, picker has exactly 4 options (* + 3 real skills).
  • Bash conditional simulation: confirmed event_name=schedule now sets MANUAL_FULL_SWEEP=1, MANUAL_SKILLS_FILTER=*; event_name=push still leaves them unset.
  • Python agent dry-run: reproduced the old FATAL: PR_NUMBER without the fix; with the fix the agent gets past the _require gate.

Why this is independent of #680

PR #680 fixed Brev auth + cleanup. Both passed cleanly on run 28214488174 (no GPU leak). This PR fixes the next failure exposed once cleanup stopped silently swallowing errors.

Test plan

  • Trigger workflow_dispatch on this branch — confirm picker only shows *, rag-blueprint, rag-eval, rag-perf, and the agent reaches [agent] starting instead of FATAL: PR_NUMBER.
  • First nightly after merge (2026-06-27 02:00 UTC) — confirm no FATAL: PR_NUMBER in logs.
  • After 7 days — audit brev ls for any leaked rag-eval-gpu-* instances (still expected: zero).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Scheduled runs now use the same skill-selection behavior as manual runs, defaulting to all skills when no selection is provided.
    • The skill selector list was simplified to the currently supported options while still allowing all skills to be selected.

…icker

Two independent fixes in one file:

1. Schedule trigger now exports MANUAL_FULL_SWEEP=1 (like
   workflow_dispatch already does). Before this, the schedule event
   fell through to the PR-mode code path in skills_eval_agent.py and
   died with "FATAL: PR_NUMBER not set in environment". Also default
   MANUAL_SKILLS_FILTER to "*" when INPUT_SKILLS is empty (schedule
   events don't populate `inputs`).

2. workflow_dispatch picker now lists the 3 skills that actually exist
   on disk (rag-blueprint, rag-eval, rag-perf) instead of 10 phantom
   names (rag-deploy-blueprint, rag-ingest-documents, etc.) that were
   left over from an old skill taxonomy. The LLM agent already
   enumerates skills/*/eval/*.json itself, so the stale picker was a
   UX trap only — anyone picking one of the old names got a silent
   BLOCKED result. ONBOARDING.md:487-492 documents this as a known
   gotcha; this PR closes the gotcha.

Discovered while diagnosing GHA run 28214488174 (first nightly after
PR #680). The Brev auth step and cleanup step from #680 both passed
cleanly — this is a separate, pre-existing bug surfaced by them
running on schedule for the first time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: richa <ricsingh@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: b25f418a-3ffe-4e87-a746-815650476b56

📥 Commits

Reviewing files that changed from the base of the PR and between 3241f06 and 7208159.

📒 Files selected for processing (1)
  • .github/workflows/skills-eval.yml

📝 Walkthrough

Walkthrough

The skills-eval workflow trims the workflow_dispatch skill choices to rag-blueprint, rag-eval, and rag-perf. The agent run script's event gating is extended to cover both workflow_dispatch and schedule, setting MANUAL_SKILLS_FILTER from INPUT_SKILLS with a fallback to * for schedule events that carry no inputs.

Changes

Skills-eval Workflow

Layer / File(s) Summary
Skill choices and filter unification
.github/workflows/skills-eval.yml
Removes extra rag-* options from the skills dispatch input, keeping only rag-blueprint, rag-eval, and rag-perf; extends the MANUAL_SKILLS_FILTER assignment to schedule events with INPUT_SKILLS:-* fallback.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐇 Hop hop, fewer choices to chase,
Three rag skills now run the race.
Schedule joins the dispatch crew,
Filter stars when inputs are few.
Clean and neat, the workflow's ace! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main workflow fixes: scheduling now uses MANUAL_FULL_SWEEP and the dispatch picker was synced.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/skills-eval-schedule-fix

Comment @coderabbitai help to get the list of available commands.

@faaranm faaranm requested a review from shubhadeepd June 29, 2026 16:02
@shubhadeepd shubhadeepd merged commit 34cf9c8 into main Jun 30, 2026
10 of 11 checks passed
@shubhadeepd shubhadeepd deleted the ci/skills-eval-schedule-fix branch June 30, 2026 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants